Inferring population mutation rate and sequencing error rate using the SNP frequency spectrum in a sample of DNA sequences.
Abstract
One challenge in analyzing samples of DNA sequences is to account for the nonnegligible polymorphisms produced by sequencing error when the error rate is high or the sample size is large. Specifically, these artificial sequence variations bias the observed single nucleotide polymorphism (SNP) frequency spectrum, which in turn may bias estimators of the population mutation rate θ = 4Nμ for diploids. In this paper, we propose a new approach based on the generalized least squares (GLS) method to estimate θ, given the SNP frequency spectrum in a random sample of DNA sequences from a population. With this approach, the error rate ε can be either known or unknown. In the latter case, ε can be estimated given an estimate of θ. Using coalescent simulation, we compared our estimators with other estimators of θ. The results showed that the GLS estimators are more efficient than other θ estimators in the presence of error, and that the estimate of ε is usable in practice when θ per bp is small. We demonstrate the application of the estimators with 10-kb noncoding region sequences sampled from a human population and provide suggestions for choosing θ estimators when errors are present.
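To make the bias described above concrete, here is a minimal Python sketch. It is not the GLS estimator proposed in the paper. It relies on the standard neutral expectation that the number of sites where the derived allele appears i times is theta/i, and on a simplifying assumption introduced here for illustration only: every sequencing error hits a distinct, otherwise monomorphic site and therefore appears as an extra singleton, adding about n * L * epsilon singletons for n sequences of length L. Under that assumption, Watterson's estimator is inflated, an estimator that ignores singletons is not, and a rough moment estimate of epsilon follows from the excess singletons. All function names are hypothetical.

```python
def harmonic(n):
    """a_n = sum_{i=1}^{n-1} 1/i for a sample of n sequences."""
    return sum(1.0 / i for i in range(1, n))


def expected_sfs(theta, n):
    """Neutral expectation of the unfolded SFS: E[xi_i] = theta / i, i = 1..n-1."""
    return [theta / i for i in range(1, n)]


def add_error_singletons(sfs, length_bp, eps):
    """Assumption (illustrative only): each error creates one extra singleton."""
    n = len(sfs) + 1
    noisy = list(sfs)
    noisy[0] += n * length_bp * eps
    return noisy


def watterson_theta(sfs):
    """Watterson's estimator: total segregating sites divided by a_n."""
    n = len(sfs) + 1
    return sum(sfs) / harmonic(n)


def theta_without_singletons(sfs):
    """Drop singletons: E[S - xi_1] = theta * (a_n - 1) under neutrality."""
    n = len(sfs) + 1
    return sum(sfs[1:]) / (harmonic(n) - 1.0)


def epsilon_from_singletons(sfs, theta_hat, length_bp):
    """Rough moment estimate under the singleton-error assumption:
    E[xi_1] = theta + n * L * eps, so eps = (xi_1 - theta) / (n * L)."""
    n = len(sfs) + 1
    return max(0.0, (sfs[0] - theta_hat) / (n * length_bp))


if __name__ == "__main__":
    # Hypothetical values chosen only for illustration.
    theta, n, length_bp, eps = 10.0, 10, 10_000, 1e-4
    noisy = add_error_singletons(expected_sfs(theta, n), length_bp, eps)
    theta_hat = theta_without_singletons(noisy)
    print("Watterson on noisy SFS:", watterson_theta(noisy))
    print("Singleton-free estimate:", theta_hat)
    print("Estimated epsilon:", epsilon_from_singletons(noisy, theta_hat, length_bp))
```

Because the input here is the expected spectrum rather than simulated data, the sketch only shows the direction of the bias; it says nothing about the variance or efficiency comparisons reported in the abstract.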
Related articles
A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data
MOTIVATION Most existing methods for DNA sequence analysis rely on accurate sequences or genotypes. However, in applications of the next-generation sequencing (NGS), accurate genotypes may not be easily obtained (e.g. multi-sample low-coverage sequencing or somatic mutation discovery). These applications press for the development of new methods for analyzing sequence data with uncertainty. RE...
I-37: Establishing High Resolution Genomic Profiles of Single Cells Using Microarray and Next-Generation Sequencing Technologies
The nature and pace of genome mutation is largely unknown. Standard methods to investigate DNA-mutation rely on arraying or sequencing DNA from a population of cells, hence the genetic composition of individual cells is lost and de novo mutation in cell(s) is concealed within the bulk signal. We developed methods based on (SNP-) arraying and next-generation sequencing of single-cell whole-genom...
The Spectrum of Mutations in 100 Thalassemic Carriers Referred to Ghaem Hospital of Mashhad
Abstract Background Thalassemia is common in the Iranian population, and it must be considered in the differential diagnosis of the microcytic hypochromic anemia. The molecular analysis of β-thalassemia is necessary for prenatal molecular diagnosis. Α-thalassemia caused by loss of function of either one of the two duplicated α-globin genes or in less frequent non deletion mutations mostly loc...
Study of frequency and spectrum of GJB2 gene mutations in non-syndromic hearing loss patients of Semnan province
Abstract Background and aim: The frequency of hearing impairment is one out of 500 newborn babies, worldwide. However, in Iran, due to the high prevalence of consanguineous marriages, this amount is estimated to be two to three times higher. So far, more than 120 genes causing non-syndromic Hearing loss (NSHL) have been identified in the world, of which GJB2 gene mutations are the most common c...
Screening for FecGH Mutation of Growth Differentiation Factor 9 Gene in Iranian Ghezel Sheep Population
Background Ghezel sheep are highly prolific and one of the local sheep breeds in Iran and Turkey. Growth differentiation factor-9 (GDF9) gene has been found to be essential for growth and differentiation of early ovarian follicles. Novel mutations in GDF9 have been associated with increased ovulation rates and high litter sizes in heterozygous carriers. Therefore, fecundity gene for GDF9 (FecGH...
Journal:
- Molecular Biology and Evolution
Volume 26, Issue 7
Pages: -
Publication date: 2009